Use gossip in drone and wallet to identify leader ports #1002
That's too much code to copy-paste. A utility function in thin_client.rs, perhaps? Also, is this really what we want? On Google Cloud, will this return the public addresses a client would need, or the private ones the network uses to minimize costs?
The public/private IPs are complicated. Maybe the network should advertise many IPs, in order of preference, but a single set of ports for all the services. Clients can then use whichever address works for them. But the above should probably be addressed in a different PR/issue.
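To make the "many IPs, in order of preference" idea concrete, here is a minimal sketch. The name `first_reachable` and the `probe` callback are hypothetical and not part of this PR; how a client would actually test reachability is left open.

```rust
use std::net::SocketAddr;

/// Hypothetical helper (not in this PR): given addresses advertised in
/// order of preference, return the first one the caller-supplied `probe`
/// reports as reachable, or `None` if none of them work.
pub fn first_reachable<F>(advertised: &[SocketAddr], mut probe: F) -> Option<SocketAddr>
where
    F: FnMut(&SocketAddr) -> bool,
{
    // `find` stops at the first match, so earlier (preferred) addresses win.
    advertised.iter().copied().find(|addr| probe(addr))
}
```

A client outside the private network would skip the private address and settle on the public one; a node inside it would take the cheaper private address first.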
src/bin/drone.rs
crdt.write().unwrap().insert(&leader_entry_point);

// Block until leader's correct contact info is received
while crdt.read().unwrap().leader_data().is_none() {}
this needs a sleep
Sure. Between the crdt write and read? How long and why?
This will spin the thread without yielding to the rest of the system, so nothing else gets to run and actually fetch the leader_data. If you sleep, that tells the OS it can do other useful things. 100ms is probably good enough.
Maybe pull this out into a function that polls for X seconds and returns an error if it fails. Otherwise we may end up with this stuck in a misconfigured test.
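A rough sketch of what such a polling helper could look like, using only the standard library. The name `poll_until`, its signature, and the `String` error are assumptions for illustration, not the code that landed in this PR.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper (not the PR's actual code): call `check` repeatedly
/// until it yields a value. `timeout = None` blocks indefinitely;
/// `Some(limit)` returns an error once `limit` has elapsed.
pub fn poll_until<T, F>(
    mut check: F,
    timeout: Option<Duration>,
    interval: Duration,
) -> Result<T, String>
where
    F: FnMut() -> Option<T>,
{
    let start = Instant::now();
    loop {
        if let Some(value) = check() {
            return Ok(value);
        }
        if let Some(limit) = timeout {
            if start.elapsed() >= limit {
                return Err(format!("timed out after {:?}", limit));
            }
        }
        // Sleep instead of spinning so the OS can schedule the threads
        // that actually receive and process gossip.
        sleep(interval);
    }
}
```

In the drone, `check` would presumably be a closure that reads `crdt` and returns the leader data, with an interval of about 100ms as suggested above.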
The recent push implements a configurable timeout for both the wallet and drone gossip poll. (The timeout can also be none, if blocking is desired.) I plugged some starting values into the bash scripts that are reasonable for my local machine. @aeyakovenko, do you have a feel for how long is reasonable for tests/automation? Maybe a question for @mvines?
Presumably the wallet timeout can be short, since we expect a full node to be up, but I'm not clear on how long we would expect between drone and leader boot.
6fdf076 to 76aec9b
src/thin_client.rs
}
}

exit.store(true, Ordering::Relaxed);
503fee3 to 20a0abd
Hey @rob-solana, are you keeping an eye on the CLI options from all our various apps? This PR adds
If I read this correctly, the timeout is in the user's hands, i.e. if no -w argument is presented, the code keeps trying forever. That sounds correct to me.
src/bin/drone.rs
Arg::with_name("wait")
    .short("w")
    .long("wait")
    .value_name("NUMBER")
clap is pretty awesome... I recommend you pass something that looks like help to value_name().
When you ask for help with this code, you see something like:
    --wait NUMBER    time to wait for leader info from gossip
What units? Why does this thing need "leader info"?
Were I the user, I'd prefer something that conveyed the units and better captured the semantics, e.g.:
    --timeout SECS    wait at most SECS seconds to get necessary gossip from the network
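The semantics being proposed (a value in seconds, and no flag meaning "wait forever") can be sketched without clap. The PR itself uses clap; the std-only `parse_timeout` function and `TIMEOUT_HELP` constant below are hypothetical stand-ins that only illustrate the behavior.

```rust
use std::time::Duration;

/// Help line in the shape suggested above (units in the value name).
pub const TIMEOUT_HELP: &str =
    "--timeout SECS    wait at most SECS seconds to get necessary gossip from the network";

/// Hypothetical std-only parser (the PR uses clap instead).
/// Returns `Ok(None)` when `--timeout` is absent, meaning "keep trying
/// forever", and `Ok(Some(duration))` when a valid SECS value is given.
pub fn parse_timeout(args: &[&str]) -> Result<Option<Duration>, String> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "--timeout" {
            let raw = iter
                .next()
                .ok_or_else(|| "--timeout requires a value in seconds".to_string())?;
            let secs: u64 = raw
                .parse()
                .map_err(|_| format!("invalid SECS value: {}", raw))?;
            return Ok(Some(Duration::from_secs(secs)));
        }
    }
    Ok(None)
}
```

Returning an `Option<Duration>` keeps "no timeout" distinct from "timeout of zero", which matches the reading that the wait is unbounded unless the user asks otherwise.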
some things ok, I think. some ergonomics polish might be worth the time
I agree
20a0abd to 2a1374d
Great suggestions!
@aeyakovenko (or @mvines) should this PR be blocked on that public/private IP issue?
@CriesofCarrots, crickets. Merge it at your leisure.
- Add utility function
- Add thread sleep
- Enable configurable timeout for gossip poll
2a1374d to 7d195e4
💔 Unable to automerge due to CI failure
Fixes #934 and unblocks #920